AUC optimization for deep learning-based voice activity detection
Abstract
Voice activity detection (VAD) based on deep neural networks (DNN) has demonstrated good performance in adverse acoustic environments. Current DNN-based VAD optimizes a surrogate function, e.g., minimum cross-entropy or minimum squared error, at a given decision threshold. However, VAD usually works on-the-fly with a dynamic decision threshold, and the receiver operating characteristic (ROC) curve is a global evaluation metric for VAD at all possible decision thresholds. In this paper, we propose to maximize the area under the ROC curve (MaxAUC) by DNN, which can maximize the performance of VAD in terms of the entire ROC curve. However, the objective of AUC maximization is nondifferentiable. To overcome this difficulty, we relax the nondifferentiable loss function to two differentiable approximation functions: sigmoid loss and hinge loss. To study the effectiveness of the proposed MaxAUC-DNN VAD, we take either a standard feedforward neural network or a bidirectional long short-term memory network as the DNN model, with either the state-of-the-art multi-resolution cochleagram or the short-term Fourier transform as the acoustic feature. We conducted noise-independent training for all comparison methods. Experimental results show that taking AUC as the optimization objective results in higher performance than the common objectives of minimum squared error and minimum cross-entropy. The experimental conclusion is consistent across different DNN structures, acoustic features, noise scenarios, training sets, and languages.
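The relaxation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the usual pairwise view of AUC (the fraction of speech/non-speech frame pairs that the scores rank correctly) and swaps the nondifferentiable indicator for a sigmoid or a hinge surrogate. The function names and the `scale` and `margin` parameters are hypothetical, chosen for illustration; the exact loss definitions used in the paper are not given here and may differ.

```python
import numpy as np


def _pairwise_diff(scores, labels):
    """Score differences s_pos - s_neg for every (speech, non-speech) pair."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]  # frames labeled as speech
    neg = scores[labels == 0]  # frames labeled as non-speech
    return pos[:, None] - neg[None, :]


def exact_auc(scores, labels):
    """Exact AUC: fraction of correctly ranked pairs (ties count as 0.5)."""
    diff = _pairwise_diff(scores, labels)
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def sigmoid_auc_loss(scores, labels, scale=1.0):
    """Sigmoid surrogate: smooth stand-in for the 0/1 ranking error.

    `scale` is an assumed sharpness parameter, not taken from the paper.
    """
    diff = _pairwise_diff(scores, labels)
    return float(np.mean(1.0 / (1.0 + np.exp(scale * diff))))


def hinge_auc_loss(scores, labels, margin=1.0):
    """Hinge surrogate: penalizes pairs ranked with less than `margin` gap.

    `margin` is an assumed hyperparameter, not taken from the paper.
    """
    diff = _pairwise_diff(scores, labels)
    return float(np.mean(np.maximum(0.0, margin - diff)))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 1, 0, 0]
    print(exact_auc(scores, labels))
    print(sigmoid_auc_loss(scores, labels))
    print(hinge_auc_loss(scores, labels))
```

Both surrogates decrease as speech frames are scored further above non-speech frames, so minimizing either one pushes the exact AUC up. In a DNN setting the same expressions would be written in an automatic-differentiation framework and evaluated on mini-batches, since the number of pairs grows with the product of the two class sizes.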
Similar resources
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
A Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is a subset of machine learning that is widely used in Artificial Intelligence (AI) fields such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inference needs to solve an optimization problem. In deep learning, the most important problem that can be solved by optimization is neural n...
Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective
The mismatch between the source and target noisy corpora severely hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we try to address this problem from a transfer learning perspective. Transfer learning tries to find a common learning machine or a common feature subspace that is shared by both the source corpus and the target corpus. The...
Melanoma detection with a deep learning model
Background: Skin cancer is one of the most common forms of cancer in the world and melanoma is the deadliest type of skin cancer. Both melanoma and melanocytic nevi begin in melanocytes (cells that produce melanin). However, melanocytic nevi are benign whereas melanoma is malignant. This work proposes a deep learning model for classification of these two lesions. Methods: In this analytic s...
Journal
Journal title: EURASIP Journal on Audio, Speech, and Music Processing
Year: 2022
ISSN: 1687-4722, 1687-4714
DOI: https://doi.org/10.1186/s13636-022-00260-9